Disaster Recovery

Last updated: 2026-04-10


Scenario 1 — LXC is broken, snapshot revert works

Symptoms: The service is down, the container won't start, or its config is corrupted.

Recovery time: ~2 minutes

# From Proxmox host
pct stop 101                    # or 102
pct rollback 101 phase1-complete
pct start 101

# Verify
pct enter 101
cd /opt/edge-gateway
docker compose ps               # All services should be Up

Note

After rollback, any changes made since the snapshot are lost. This includes n8n workflows created after the snapshot, NPM proxy host changes, and firewall rule edits.
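The rollback above fails partway if the snapshot name is wrong, leaving the container stopped. A minimal sketch of a guard, assuming it runs on the Proxmox host: `rollback_ct` is a hypothetical helper name, and the check relies only on the snapshot name appearing in the `pct listsnapshot` output.

```shell
# Sketch: roll a container back only if the named snapshot exists.
# rollback_ct is a hypothetical helper; it is not part of Proxmox.
rollback_ct() {
    local vmid="$1"
    local snap="$2"

    # pct listsnapshot prints one line per snapshot of the container.
    if ! pct listsnapshot "$vmid" | grep -wF -- "$snap" > /dev/null; then
        echo "CT $vmid has no snapshot named '$snap'; see Scenario 2" >&2
        return 1
    fi

    pct stop "$vmid" || true              # the container may already be stopped
    pct rollback "$vmid" "$snap" || return 1
    pct start "$vmid"
}
```

Usage: `rollback_ct 101 phase1-complete`. A non-zero exit before anything is stopped means the snapshot was not found.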


Scenario 2 — LXC is broken, no usable snapshot

Symptoms: Snapshot is also corrupted, or snapshot was never taken.

Recovery time: ~30 minutes per container

Rebuild edge-gateway (CT 101)

  1. Delete the broken container:

    pct stop 101
    pct destroy 101
    

  2. Recreate from scratch following Phase 1 Implementation Guide.md Steps 3.1–3.4

  3. Restore config files:

    - /opt/edge-gateway/docker-compose.yml: copy from Implementation Guide Step 3.4
    - /opt/edge-gateway/.env: retrieve TUNNEL_TOKEN from password manager or Cloudflare dashboard (Zero Trust → Tunnels → exzentcg-homelab → Configure → copy token)
    - NPM proxy host config is stored in /opt/edge-gateway/npm/data/. If this was backed up, restore it. If not, recreate the proxy host for n8n.exzentcg.com manually (Step 8)

  4. Fix DNS:

    cat > /etc/resolv.conf <<'EOF'
    nameserver 1.1.1.1
    nameserver 1.0.0.1
    EOF
    chattr +i /etc/resolv.conf

  5. Start services and verify:

    cd /opt/edge-gateway
    docker compose up -d
    docker compose logs --tail 30 cloudflared
    # Look for "Registered tunnel connection" lines

  6. Recreate firewall rules: copy /etc/pve/firewall/101.fw from Phase 1 Actions.md Step 5.1
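The log check in the "Start services and verify" step can be scripted rather than read by eye. This is only a sketch: `tunnel_is_up` is a hypothetical helper, and the default threshold of one registered connection is an assumption.

```shell
# Sketch: succeed if the cloudflared log on stdin shows enough
# "Registered tunnel connection" lines.
# tunnel_is_up is a hypothetical helper; the default of 1 is an assumed threshold.
tunnel_is_up() {
    local min_conns="${1:-1}"
    local count
    # grep -c prints the number of matching lines; "|| true" covers the zero-match exit code.
    count=$(grep -c 'Registered tunnel connection' || true)
    [ "$count" -ge "$min_conns" ]
}

# Usage (inside CT 101):
#   cd /opt/edge-gateway
#   docker compose logs --tail 30 cloudflared | tunnel_is_up && echo "tunnel OK"
```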

Rebuild n8n-app (CT 102)

  1. Delete and recreate the container following Implementation Guide Steps 4.1–4.4

  2. Restore config files:

    - /opt/n8n/docker-compose.yml: copy from Implementation Guide Step 4.4
    - /opt/n8n/.env: retrieve N8N_ENCRYPTION_KEY from password manager

  3. Restore n8n data:

    - If /opt/n8n/data/ was backed up, restore it and chown -R 1000:1000 ./data
    - If not backed up, n8n starts fresh: all workflows, credentials, and the owner account are lost. You must redo the setup wizard.

  4. Fix permissions and start:

    cd /opt/n8n
    chown -R 1000:1000 ./data
    docker compose up -d

  5. Recreate firewall rules: copy /etc/pve/firewall/102.fw from Phase 1 Actions.md Step 5.2
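The "Restore n8n data" step could look roughly like the sketch below. It assumes the archive was created with tar czf <archive> /opt/n8n/data/ (GNU tar stores that path without the leading slash). `restore_n8n_data` and its optional arguments are hypothetical names.

```shell
# Sketch: restore an n8n data archive and fix ownership.
# restore_n8n_data is a hypothetical helper; it assumes the archive holds opt/n8n/data/.
restore_n8n_data() {
    local archive="$1"
    local root="${2:-/}"            # extraction root; "/" restores in place
    local owner="${3:-1000:1000}"   # UID:GID used by the chown in the steps above

    [ -f "$archive" ] || { echo "archive not found: $archive" >&2; return 1; }

    # Refuse archives that do not contain the expected directory.
    tar tzf "$archive" | grep '^opt/n8n/data/' > /dev/null || {
        echo "archive does not contain opt/n8n/data/" >&2
        return 1
    }

    tar xzf "$archive" -C "$root" || return 1
    chown -R "$owner" "$root/opt/n8n/data"
}
```

Run it inside CT 102 before docker compose up -d, passing the archive path as the only argument.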


Scenario 3 — Proxmox host dies completely

Symptoms: Hardware failure, disk corruption, total loss.

Recovery time: ~2 hours

What you need:

  - A new machine (or repaired hardware)
  - Proxmox VE ISO (download from proxmox.com)
  - This Obsidian vault (stored on your laptop, not on the Proxmox host)
  - Access to your password manager

Steps:

  1. Install Proxmox VE fresh on the new hardware
  2. Set the management IP to 192.168.0.200 (or update all references)
  3. Install Tailscale: curl -fsSL https://tailscale.com/install.sh | sh && tailscale up
  4. Recreate Datacenter firewall IP sets (Step 1 of Implementation Guide)
  5. Enable Datacenter firewall (Step 2)
  6. Create node firewall rules (Step 0.2)
  7. Recreate CT 101 edge-gateway (Steps 3.1–3.4)
  8. Recreate CT 102 n8n-app (Steps 4.1–4.4)
  9. Apply container firewall rules (Step 5)
  10. Start cloudflared with stored tunnel token (Step 7.2–7.3)
  11. Recreate NPM proxy host (Step 8)
  12. Verify end-to-end: https://n8n.exzentcg.com
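Before the end-to-end check, it may help to confirm that both containers exist and are running. A minimal sketch, assuming the usual `pct list` layout with the VMID in the first column and the status in the second. `check_cts` is a hypothetical helper.

```shell
# Sketch: confirm the given containers are present and running.
# check_cts is a hypothetical helper that reads `pct list` output on stdin.
check_cts() {
    local listing
    local rc=0
    local vmid
    listing=$(cat)
    for vmid in "$@"; do
        # Column 1 is the VMID, column 2 the status.
        if ! printf '%s\n' "$listing" |
            awk -v id="$vmid" '$1 == id && $2 == "running" { ok = 1 } END { exit !ok }'
        then
            echo "CT $vmid is missing or not running" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Usage (on the Proxmox host):
#   pct list | check_cts 101 102
```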

Warning

Cloudflare-side config (tunnel, Access policies, DNS) survives a host death. You do NOT need to recreate the tunnel, DNS records, or Access applications. Only the on-premises infrastructure needs rebuilding.


Scenario 4 — N8N_ENCRYPTION_KEY lost

Symptoms: n8n starts but all credential nodes show errors. Workflows that use stored API keys/tokens fail.

Recovery: There is no recovery. n8n encrypts stored credentials with this key using AES-256, so without it the encrypted credential blobs in n8n's SQLite database are unreadable.

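Mitigation:

  1. Generate a new encryption key and update .env. Do this first: credentials are encrypted with whichever key is active when they are saved, so changing the key afterwards would make them unreadable again.

    openssl rand -hex 32
    nano /opt/n8n/.env   # replace old key with new
    cd /opt/n8n
    docker compose up -d --force-recreate n8n   # restart alone does not reload .env

  2. Re-enter every credential manually in n8n
  3. Re-test every workflow that uses credentials
  4. This time, back up the key in two places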
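To check that a backup copy of the key really matches the live one without printing either value, you can compare hashes. This is a sketch only: `key_fingerprint` is a hypothetical helper, and the backup path in the usage line is an assumed location.

```shell
# Sketch: print a SHA-256 fingerprint of the key in an env file, never the key itself.
# key_fingerprint is a hypothetical helper.
key_fingerprint() {
    local env_file="$1"
    local key
    # Take everything after "N8N_ENCRYPTION_KEY=" on the first matching line.
    key=$(sed -n 's/^N8N_ENCRYPTION_KEY=//p' "$env_file" | head -n 1)
    [ -n "$key" ] || { echo "no N8N_ENCRYPTION_KEY in $env_file" >&2; return 1; }
    printf '%s' "$key" | sha256sum | cut -d ' ' -f 1
}

# Usage (the second path is an assumed backup location):
#   [ "$(key_fingerprint /opt/n8n/.env)" = "$(key_fingerprint /mnt/usb/n8n.env)" ] \
#       && echo "backup matches"
```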


Scenario 5 — Cloudflare Tunnel token compromised

Symptoms: Someone has your tunnel token and could potentially route traffic through your tunnel.

Recovery:

  1. Go to Cloudflare Zero Trust → Networks → Tunnels → exzentcg-homelab
  2. Rotate the tunnel token (or delete and recreate the tunnel)
  3. Copy the new token
  4. Update on edge-gateway:
    pct enter 101
    cd /opt/edge-gateway
    nano .env   # replace TUNNEL_TOKEN value
    docker compose up -d --force-recreate cloudflared   # restart alone does not reload .env
    docker compose logs --tail 30 cloudflared
    # Verify "Registered tunnel connection" appears
    
  5. If you recreated the tunnel, you also need to re-add the public hostname route and update the DNS CNAME
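If you would rather not edit .env by hand, the token swap can be sketched as a small function. `set_env_value` is a hypothetical helper, not part of Docker or cloudflared. It keeps a .bak copy and avoids sed so that characters such as "/" or "+" in a token need no escaping.

```shell
# Sketch: replace one KEY=value line in an env file, keeping a backup copy.
# set_env_value is a hypothetical helper.
set_env_value() {
    local file="$1"
    local key="$2"
    local value="$3"
    local tmp

    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    tmp=$(mktemp) || return 1
    cp -p "$file" "$file.bak"

    # Copy every line except the old assignment, then append the new one.
    grep -v "^${key}=" "$file" > "$tmp" || true
    printf '%s=%s\n' "$key" "$value" >> "$tmp"

    cat "$tmp" > "$file"    # overwrite in place so the file mode is preserved
    rm -f "$tmp"
}
```

Usage inside CT 101: `set_env_value /opt/edge-gateway/.env TUNNEL_TOKEN "$NEW_TOKEN"`, where NEW_TOKEN is a variable you set yourself. Recreate the cloudflared container afterwards and delete .env.bak once the tunnel is confirmed working.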

| What | How | Frequency |
| --- | --- | --- |
| Proxmox LXC snapshots | pct snapshot <id> <name> | After any significant change |
| n8n data directory | tar czf /root/n8n-backup-$(date +%F).tgz /opt/n8n/data/ (run in CT 102) | Weekly or before n8n updates |
| NPM data directory | tar czf /root/npm-backup-$(date +%F).tgz /opt/edge-gateway/npm/ (run in CT 101) | After proxy host changes |
| This Obsidian vault | Git repo or cloud sync (OneDrive, etc.) | Continuous |
| Password manager | Cloud-synced (Bitwarden, 1Password) | Continuous |
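The two tar commands in the table above can be wrapped so that each archive is verified after it is written. This is a sketch, not a finished backup tool: `backup_dir` is a hypothetical helper, and the default destination /root simply follows the table.

```shell
# Sketch: write a date-stamped tarball of one directory and verify it.
# backup_dir is a hypothetical helper; the source must be an absolute path.
backup_dir() {
    local src="$1"            # e.g. /opt/n8n/data
    local prefix="$2"         # e.g. n8n or npm
    local dest="${3:-/root}"
    local archive

    [ -d "$src" ] || { echo "missing source: $src" >&2; return 1; }
    archive="$dest/${prefix}-backup-$(date +%F).tgz"

    # -C / plus a relative member path avoids tar's leading-slash warning.
    tar czf "$archive" -C / "${src#/}" || return 1
    # Listing the archive catches a truncated or unreadable file.
    tar tzf "$archive" > /dev/null || return 1
    echo "$archive"
}
```

Usage: `backup_dir /opt/n8n/data n8n` inside CT 102, or `backup_dir /opt/edge-gateway/npm npm` inside CT 101. The function prints the archive path on success.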